Some Experiments in Machine Learning Using the Game of Checkers: III -
An Epilogue

Introduction

Some ten years have elapsed since active work was done on the checker
program described in the two earlier papers with this same title.  This
earlier work was flawed by two assumptions that now appear to be
untenable.  It therefore seems appropriate to review this earlier work and
to try to rectify these earlier mistakes.  This present work is, however,
in the nature of an epilogue since the current (and soon to be realized)
advances in the hardware design of computers are sure to make obsolete
much of the programming effort that now has to be expended to overcome the
speed and memory limitations of existing computers.  What this paper hopes
to do is to record the present state of the art as a landmark against
which future progress can be measured.

We will briefly review the earlier work.  The reader who does not find
this review adequate might do well to refresh his memory by referring to
these earlier papers.

Two machine learning procedures were described in some detail: (1) a rote
learning procedure in which a record was kept of the board situations
encountered in actual play together with information as to the results of
the machine analysis of each situation; this information could be referenced
at terminal board situations of each newly initiated tree search and thus,
in effect, allow the machine to look ahead further than time would
otherwise permit, and (2) a generalization learning procedure in which the
program continuously re-evaluated certain learned parameters that were
used to evaluate the board positions at the terminal board positions of a
look-ahead tree search.  The rote learning procedure was characterized by
a very slow but continuous learning rate.  It was most effective in the
opening and end-game phases of the play.  The generalization learning
procedure, by way of contrast, learned at a more rapid rate but soon
approached a plateau set by certain limitations which only slowly became
apparent.  It was surprisingly good at mid-game play but fared badly in the
opening and end-game phases.  While the generalization-learning procedure
could be applied to actual games against human opponents, it soon became
evident that the learning procedure could be greatly speeded up by using book
learning, in which it was always assumed that the play, as outlined in the book,
was along the optimum path, and the learning coefficients were adjusted
incrementally so as to bias the computer play in the direction of the book game.
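
As an illustration of the rote learning idea, the following minimal sketch
(in Python, with a stand-in board encoding and a stand-in static
evaluation; all names are illustrative, not those of the original program)
shows how a stored record of earlier analyses can be consulted at the
terminal positions of a new tree search:

    # Rote-learning memory: positions seen before are scored from the record.
    rote_memory = {}   # board key -> score backed up from an earlier analysis

    def evaluate(board):
        # Stub static evaluation; the real program scored material,
        # mobility, and other board properties.
        return 0.0

    def terminal_value(board):
        # If this position was analyzed in an earlier game, the stored score
        # already reflects a search from that point, so consulting it in
        # effect extends the look-ahead beyond its nominal depth.
        key = tuple(board)        # assumes a board encoded as a flat sequence
        return rote_memory.get(key, evaluate(board))

    def record_analysis(board, backed_up_score):
        # Called after each completed search, so learning continues with play.
        rote_memory[tuple(board)] = backed_up_score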


In the earlier work on generalization learning, as reported in the first
paper, the learned parameters were the coefficients for a Linear
Polynomial.  In subsequent work, reported in the second paper, a Signature
Table scheme was used.  At the time of publication of the second paper, it
was thought that the Signature Table scheme had remedied the major defects
of the generalization learning scheme and that with a longer learning
period this scheme would be vastly superior to the linear polynomial
scheme.  These hopes were never realized.  After an initial period in
which this scheme seemed to show real promise, it developed that even
non-master players were able to discover certain basic limitations of the
program, and they were able to lead the machine down avenues of play where
it did very badly indeed.  At the time, it was not immediately obvious how
these limitations could be overcome.  This, coupled with the pressure of
other work, led to the abandonment of further work.  It has been only
recently that methods of overcoming these limitations have either occurred
to the writer or have been suggested to him by others.
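
To make the contrast concrete, the following sketch (feature definitions,
groupings, and table sizes are invented for illustration) shows the two
evaluation forms: a linear polynomial sums weighted feature values, while
a signature table uses the joint combination of a few quantized features
as an index into a table of learned values, and so can capture
interactions among features that no linear form can express:

    from itertools import product

    def linear_poly_eval(features, coefficients):
        # Linear polynomial: score is a weighted sum of measured features.
        return sum(c * f for c, f in zip(coefficients, features))

    def make_signature_table(levels_per_feature):
        # One table entry per joint combination (the "signature") of the
        # quantized feature values; the entries are learned, initially zero.
        return {sig: 0.0 for sig in
                product(*(range(n) for n in levels_per_feature))}

    def signature_eval(quantized_features, table):
        return table[tuple(quantized_features)]

(The scheme actually used arranged several such tables in a hierarchy, the
outputs of one level serving as inputs to the next.)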

Many of the aspects of the actual playing techniques as described in (2),
as contrasted with board evaluation techniques, need little or no
revision.  Specific techniques such as Alpha-Beta Pruning, Plausibility
Analysis, Multiple-Path Enhanced-Plausibility and Forward Pruning, are
still useful although the degree of dependence on them can be greatly
lessened by improvements in board evaluations.  Lacking a good evaluation
procedure, one must search the game tree along many paths and to great
depths.
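
For reference, Alpha-Beta Pruning itself can be stated in a few lines; the
sketch below (with stub move generation and evaluation routines standing
in for the program's own) cuts off any branch that cannot affect the final
minimax value:

    def legal_moves(board):
        return []                     # stub: the real generator lists moves

    def apply_move(board, move):
        return board                  # stub: the real routine makes the move

    def evaluate(board):
        return 0.0                    # stub: static board evaluation

    def alpha_beta(board, depth, alpha, beta, maximizing):
        moves = legal_moves(board)
        if depth == 0 or not moves:
            return evaluate(board)
        if maximizing:
            best = float("-inf")
            for move in moves:
                best = max(best, alpha_beta(apply_move(board, move),
                                            depth - 1, alpha, beta, False))
                alpha = max(alpha, best)
                if alpha >= beta:     # remaining moves cannot matter
                    break
            return best
        else:
            best = float("inf")
            for move in moves:
                best = min(best, alpha_beta(apply_move(board, move),
                                            depth - 1, alpha, beta, True))
                beta = min(beta, best)
                if alpha >= beta:
                    break
            return best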

What is in question is the learning techniques that have been used up to
now.  Three aspects of learning must be considered: (1) the nature of the
source material that the program is given, (2) how this source material
is processed and stored, and (3) how this learned material is used in
evaluating new situations.  We will discuss these three aspects in some
detail.

People learn in a variety of ways: (1) from their own experience, making
experiments and observing the results, (2) by watching experts and
observing what they do, whether directly or by reading about
the actions of the experts as recorded in books (in the case of games, by
books of master play) and finally, (3) by being told by the experts
something about the thought processes that they used in governing their
behavior.  This last form of learning usually involves direct
interaction between teacher and pupil, since experts are seldom aware of
exactly how they arrive at their decisions and are often quite unable to
formalize their intuitive knowledge in any meaningful way.  It has been
the writer's experience that this discrepancy between conscious knowledge
and intuitive expertise is particularly marked in the case of checker
masters, particularly so in masters who have studied under other masters
and who have arrived at their final level of proficiency as a result of
years of study.

So far, this last form of learning has been essentially absent
from my experiments.  This now seems to be a grievous omission, and the
steps described in this paper report on one way to rectify this deficiency.
Computer knowledge, by its essential nature, cannot be intuitive knowledge,
and a way must be found to work with a checker master and force him to make
his implicit knowledge explicit so that it can be stored in the computer.

As an aside, it is interesting to note that no reliance is even now being
placed on machine learning techniques in the chess playing programs
that are being given so much attention.  The better programs still rely
almost entirely upon speed, extending the analysis to greater and greater
depths to compensate for the almost complete lack of understanding of how
the truly expert chess player achieves his results.

In the earlier reported learning experiments, the program was given the
basic rules of the game and some limited expert knowledge of the
factors that expert players seem to use in their evaluation of board
situations, as filtered through this programmer and transmitted to the
program by being written into the program itself.  The program was then
exposed to either direct play against human players or presented with
stored information in the form of book games which report, basically, how
the better human players react to the specific board situations that
present themselves during play.  At no time was the program privy to the
thought processes that were used by these expert players in making their
decisions or to the relative weights that the experts assigned to the
various board properties that the program was able to evaluate.
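
The way such book games reduce to training material can be indicated
briefly; in the sketch below (the game format and helper routines are
assumptions made for illustration), each recorded position is paired with
the master's actual move, which is taken to lie on the optimum path:

    def initial_board():
        return ()                     # stub: the standard starting position

    def apply_move(board, move):
        return board                  # stub: the real routine makes the move

    def training_pairs(book_games):
        # Yield (position, book_move) pairs from recorded master games.
        for game in book_games:       # each game is its sequence of moves
            board = initial_board()
            for book_move in game:
                yield board, book_move    # the move the learner should prefer
                board = apply_move(board, book_move)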

The program was left with the task of inferring the proper weights to be
assigned to these various properties of board situations by a hill
climbing technique in which it altered coefficients (or numbers stored in
signature tables).  In any specific instance the program was forced to
credit or debit all of the various parameters more or less indiscriminately,
but in such a way that on the average the weights were gradually changed in
such a direction as to cause the program to play more and more like the
experts.  That the program was able to improve its play in this way is
more a triumph of brute speed than an indication of any essential merit of
this procedure.  People do learn in this way, but fortunately there are
much more effective ways of learning.
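
The kind of indiscriminate correction involved can be suggested by the
following sketch, which is a modern restatement rather than the original
procedure; the step size and the feature-extraction routine are
illustrative assumptions.  Every coefficient is nudged in whatever
direction raises the score of the book move relative to its rivals:

    def features(board):
        return [0.0]    # stub: one measured board property per learned weight

    def apply_move(board, move):
        return board                  # stub: the real routine makes the move

    def adjust_weights(weights, board, legal, book_move, step=0.01):
        feats_book = features(apply_move(board, book_move))
        for move in legal:
            if move == book_move:
                continue
            feats_other = features(apply_move(board, move))
            # Credit or debit every parameter a little, so that on the
            # average the weights drift toward preferring the book move.
            for i, (fb, fo) in enumerate(zip(feats_book, feats_other)):
                weights[i] += step * (fb - fo)
        return weights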

So the problem seems to be to devise some method of extracting intuitive
knowledge from one or more checker masters and to make this knowledge explicit.
Attempts to get master players to justify their choice of moves during play
have, in the past, always foundered on the fact that the chosen move is
seldom a winning move.  In fact there is usually no best move but only several
moves that are judged to be less bad than some others.

It now seems that
we have been asking the wrong question and that it would be more appropriate
to ask why the less favored moves are thought to be bad.
There are several advantages to this negative approach.  In the first place,
the answer is immediately apparent in many cases, and a tabulation of the answers
will lead to a list of several reasons for not making certain moves.  Further, there
are usually many more bad moves than good moves from any given board position, so it
should be possible to accumulate much more data.  Another factor to consider is that
the master player will be asked to give answers to questions that he normally does
not answer, and for which he may not have developed over time a pseudo-answer which
he now uses as a mnemonic aid to recall position-specific knowledge for which he
has long since forgotten the real reason and has substituted an easier to remember
but otherwise quite false reason.
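
One way to make such answers usable is simply to tabulate them; the small
sketch below (reason codes and names are invented for illustration)
records, for each position shown to the master, why each rejected move is
thought to be bad, and counts how often each reason recurs:

    from collections import Counter

    rejection_log = []    # (position_key, move, reason) triples

    def record_rejection(position_key, move, reason):
        rejection_log.append((position_key, move, reason))

    def tabulate_reasons():
        # The recurring reasons form the desired list of grounds for
        # not making certain moves.
        return Counter(reason for _, _, reason in rejection_log)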